Abstract

We study the Lobby-Hirsch index (h-index for short) as a local node
centrality measure for complex networks. The h-index is compared with degree
(a local measure) and with betweenness and Eigenvector centralities (two
global measures) in the case of a biological network (the Yeast
protein-protein interaction network) and a linguistic network (Moby
Thesaurus II). In both networks, the h-index correlates poorly with
betweenness but correlates with degree and Eigenvector centrality. Being a
local measure, the h-index has two advantages: it carries more information
about a node's neighbors than degree centrality does, and it requires less
time to compute than Eigenvector centrality. The results suggest that the
h-index produces better results than the degree and Eigenvector measures for
ranking purposes, which makes it a suitable tool for this task.

Keywords: Lobby-Hirsch index, Lobby index, centrality, degree,
betweenness, Eigenvector

1. Introduction 

The Hirsch index has been thoroughly studied for scientometric purposes.
It has been applied to networks of collaboration between individual
researchers [1, 2, 3, 4, 5], research groups [6], journals [7, 8] and
countries [9] obtained from citation databases. In this context, the h-index
is the largest integer h such that a node from a given network has at least
h neighbors which have a degree of at least h [1].
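The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected network stored as a dictionary that maps each node to the set of its neighbors; the function name `lobby_index` and the toy graph are hypothetical and not taken from the paper.

```python
def lobby_index(adj, node):
    """Largest h such that `node` has at least h neighbors of degree >= h."""
    # Degrees of the neighbors of `node`, largest first.
    neighbor_degrees = sorted((len(adj[v]) for v in adj[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(neighbor_degrees, start=1):
        if degree >= rank:
            h = rank      # the first `rank` neighbors all have degree >= rank
        else:
            break
    return h


# Toy undirected graph (illustrative data only): a hub "a" with four
# neighbors, two of which ("b" and "c") are also linked to each other.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"a"},
}

h_values = {node: lobby_index(adj, node) for node in adj}
```

In this toy graph the hub has degree 4 but only two neighbors with degree of at least 2, so its h-index is 2, which illustrates how h discounts neighbors that are themselves poorly connected.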

Korn et al. [10] have proposed a general index for network node centrality
based on the h-index. Korn et al. named it the Lobby index, but since it is
simply the application of Hirsch's idea in the context of general networks,
we call it the Lobby-Hirsch h-index. Korn et al. argue that the proposed
index contains a mix of properties of other well-known centrality measures.
However, they have studied it mainly in the context of artificial networks
like the Barabási-Albert model [11].

Like the h-index, degree D is a local centrality measure, equal to the
number of links of a given node. If the network is directed, the number of
outlinks is the outdegree and the number of inlinks is the indegree. Unlike
the h-index, betweenness and Eigenvector are global centrality measures that
take into account all nodes in the network. The betweenness B of a given
node is proportional to the number of geodesic paths (minimal paths between
node pairs in the network) that pass through it. It seems to be an important
measure for networks where such minimal paths represent transport channels
for information (internet, social networks), energy (power grids), materials
(airport networks) or diseases (social and sexual networks). The Eigenvector
centrality $E_i$ of a node $i$ is proportional to the sum of the centralities
of the nodes to which it is connected,

$$E_i = \frac{1}{\lambda} \sum_{j=1}^{n} a_{ij} E_j ,$$

where $\lambda$ is the largest eigenvalue of the adjacency matrix
$A = (a_{ij})$ and $n$ is the number of nodes.
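The Eigenvector centrality defined above (each node's score proportional to the sum of its neighbors' scores) can be approximated by power iteration. The sketch below is an assumption-laden illustration, not the authors' implementation: it assumes an undirected network stored as a dictionary of neighbor sets, iterates on A + I (a standard shift that keeps the same eigenvectors but avoids oscillation on bipartite graphs), and rescales so that the largest score is 1. The function name and the toy graph are hypothetical.

```python
def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on (A + I); scores are scaled so the maximum is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # x[v] is the "+ I" shift; the sum is the row of A applied to x.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: value / top for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Toy graph (illustrative data only): a triangle a-b-c plus a pendant node d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

E = eigenvector_centrality(adj)
# Estimate of the largest eigenvalue, read off from the defining relation.
lam = sum(E[u] for u in adj["a"]) / E["a"]
```

Every sweep touches all nodes and links, and several sweeps are needed before the scores settle, which is the sense in which this measure is global and iterative.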

In this paper, we compare the h-index with degree, betweenness and 
Eigenvector centralities applied to associative (non-transport) networks to 
obtain the correlation between these measures. 

2. Methods 

We calculate the h-index, degree D, betweenness B and Eigenvector E 
centralities for the nodes in linguistic and biological networks already consid- 
ered by the physics community. We also plot the dispersion of D versus h, 
B versus h and E versus h, to verify the correlation between these measures. 
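The comparison in this paper relies on visual inspection of dispersion plots. As a complementary and purely hypothetical way to summarize such a plot in one number, the sketch below computes a Spearman rank correlation between two lists of node scores (for example h and D). The choice of Spearman correlation and all function names are assumptions made for illustration; the paper does not report such coefficients.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A rank-based coefficient is used here because the quantities of interest span several orders of magnitude and the paper's focus is on ranking rather than on absolute values.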

The linguistic database used was the Moby Thesaurus II [13], composed
of 30,260 words, some of whose network properties have already been studied
[14, 15]. To construct the network, we use the convention that an outlink
goes from a root word to a synonym. As an example, in the entry

set, assign, assign to, assigned, ...

the word "set" is the root and the links go to its synonyms. We obtain the
directed links "set" → "assign", "set" → "assign to" and "set" → "assigned".
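The convention described above (root word first, then its synonyms) can be sketched as follows. The comma-separated layout is assumed from the example entry shown in the text, and the function name `entry_to_links` is hypothetical.

```python
def entry_to_links(entry):
    """Turn one thesaurus entry into directed (root, synonym) links."""
    words = [w.strip() for w in entry.split(",")]
    words = [w for w in words if w]          # drop empty fields
    if not words:
        return []
    root, synonyms = words[0], words[1:]
    return [(root, synonym) for synonym in synonyms]


links = entry_to_links("set, assign, assign to, assigned")
```

Multi-word expressions such as "assign to" are kept as single nodes, since only commas separate the fields.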

The raw thesaurus has over 2.5 million links, but there are many words
with only inlinks, that is, they are not root words. We worked with a filtered
version containing about 1.7 million links where only root words constitute
nodes. The minimal number of outlinks is 17 and the maximum is 1,106.
Notice that the graph is directed [15]; as centrality measures we have used
the outdegree and an h-index based on the outdegree.
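A minimal sketch of the filtering step and of an outdegree-based h-index is given below. It rests on assumptions that the text does not spell out: links pointing to non-root words are simply dropped, outdegrees are counted after filtering, and the h-index of a node is computed from the outdegrees of the nodes it points to. The function names and the toy links are hypothetical.

```python
def filter_to_roots(links):
    """Keep only links whose source and target are both root words."""
    roots = {source for source, _ in links}
    out = {root: set() for root in roots}
    for source, target in links:
        if target in roots and target != source:
            out[source].add(target)
    return out


def out_h_index(out, node):
    """Largest h such that `node` points to >= h nodes of outdegree >= h."""
    degrees = sorted((len(out[v]) for v in out[node]), reverse=True)
    h = 0
    while h < len(degrees) and degrees[h] >= h + 1:
        h += 1
    return h


# Toy directed links (illustrative data only); "assigned" is never a root.
links = [
    ("set", "assign"), ("set", "fix"), ("set", "assigned"),
    ("assign", "set"),
    ("fix", "set"), ("fix", "assign"),
]
out = filter_to_roots(links)
h_set = out_h_index(out, "set")
```

Here "assigned" has only an inlink, so it disappears from the filtered network together with the link pointing to it.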

The biological network is the yeast protein-protein network downloaded
from the BioGRID repository [16]. This is a curated repository of physical
and genetic interactions covering 5,433 proteins and over 150,000 unambiguous
interactions.

The BioGRID network is composed of gene products connected by
links [in]. The links include direct physical binding of two proteins,
coexistence in a stable complex, or genetic interaction as given by one or
several experiments described in the literature. As an example, using the
entries

YFL039C YBR243C
YFL039C YKL052C

extracted from the BioGRID data set, two links are created: "YFL039C" - "YBR243C"
and "YFL039C" - "YKL052C". The network is undirected.
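Building the undirected network from such entries can be sketched as below. The sketch assumes one interaction per line with two whitespace-separated identifiers, as in the entries shown above (the actual BioGRID download format may differ); skipping self-interactions and collapsing repeated pairs into a single link are further assumptions. The function name is hypothetical.

```python
def read_undirected(lines):
    """Build an undirected adjacency dictionary from 'ID_A ID_B' lines."""
    adj = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue                 # ignore blank or malformed lines
        a, b = parts
        if a == b:
            continue                 # assumption: self-interactions are skipped
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)   # same link seen from the other end
    return adj


entries = ["YFL039C YBR243C", "YFL039C YKL052C"]
adj = read_undirected(entries)
degree = {protein: len(neighbors) for protein, neighbors in adj.items()}
```

Storing neighbors in sets means that an interaction reported by several experiments still counts as one link.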

3. Results

3.1. Local measure: degree 

In Figure 1 we present dispersion plots of h versus D for the networks
studied. The h-index is correlated with D (h ∝ D) in the low-D regime
(D < 100) in both networks. For higher D, however, the highest h values
follow a power law in D in both networks; the origin of this anomalous
exponent is not clear. We notice that, although correlated, the two measures
are not redundant. In the thesaurus case, words with low frequency of
use or that are non-polysemous present low h but high degree.



Figure 1: Log-log dispersion plot of h versus degree centrality D for a) Moby
Thesaurus II and b) Yeast network.



3.2. Global measures: betweenness and Eigenvector 

We now compare the h-index with two standard global centrality measures,
betweenness and Eigenvector. First, we present in Figure 2 the dispersion
plots of h versus B. The h-index presents no strong correlation with
B in either network.

In Figure 3 we give the dispersion plot for the h-index versus the
Eigenvector centrality E for the thesaurus network. In the high-E regime the
maximal h values are bounded by a power law in E, as in the h versus D plot.
We observe several nodes with high E but relatively low h (see inset).
Examining these nodes individually, we find that h seems to outperform E in
the ranking task, since words with high h also have high E and are basic and
important polysemous words. In contrast, terms with high E can have high or
low h. Those with low h are mostly phrasal verbs or multiple-word expressions
derived from the words with high h.

It is difficult to assess the quality of a ranking list, but the above effect
is very clear, as can be observed in Table 1 (see Appendix), which shows the
top 25 words ranked by h and E, and the same occurs for other high-E and
low-h words.

Figure 2: Log-log dispersion plot of h versus Betweenness B for a) Moby
Thesaurus II network and b) Yeast network.

In the case of the Yeast protein network, we observe a strong correlation
between h and E for E > 0.2. The highest h values also seem to be bounded by
a power law in E. Also, the results suggest that the h-index could outperform
E in the task of classifying relevant nodes. We observe a detached cluster of
nodes with low E and moderate h (see Figure 4). It is very interesting that
all these nodes seem to correspond to ribosome proteins, meaning that the
h-index carries information that can be useful for detecting modules of
functionally related proteins.
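The ranking comparison for the thesaurus can be illustrated with a few of the (h, E) pairs listed in Table 1 of the Appendix. The sketch below ranks the same words by each measure and lists those that reach the top by E but not by h; the helper name `top_k`, the choice of eight words and the cutoff k = 5 are illustrative assumptions.

```python
# (h, E) values for a few words, taken from Table 1 in the Appendix.
scores = {
    "cut": (252, 0.930),
    "cut up": (74, 1.000),
    "set": (237, 0.701),
    "set upon": (31, 0.765),
    "turn": (230, 0.760),
    "line": (232, 0.687),
    "break up": (106, 0.690),
    "run": (233, 0.608),
}


def top_k(scores, index, k):
    """The k words with the largest value at position `index` (0: h, 1: E)."""
    return sorted(scores, key=lambda w: scores[w][index], reverse=True)[:k]


top_h = top_k(scores, 0, 5)
top_e = top_k(scores, 1, 5)
# Words promoted by E that the h ranking leaves out of its top five.
only_in_e = [w for w in top_e if w not in top_h]
```

For this subset, the words that appear only in the E ranking are "cut up" and "set upon", which are expressions derived from the high-h words "cut" and "set", in line with the observation made in the text.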



4. Discussion 

In the regime relevant for ranking purposes, the biological network data
show a strong correlation between the Eigenvector and Lobby-Hirsch
centralities, although the computation of the Lobby-Hirsch index is much less
demanding because it is not iterative and uses only local information. This
suggests that the h centrality can be useful for ranking purposes in large
databases, with results comparable to those of Eigenvector centrality. This
claim could be tested in the paper citation network studied by Chen et al.
[17], where the Page-Rank algorithm, whose core is the Eigenvector centrality
concept, has given interesting results.

Figure 3: Log-log dispersion plot of h versus Eigenvector centrality E for
the Moby Thesaurus II. Inset: linear scale; notice the several words with
high E but low h.

Local measures, such as the h-index, seem to make more sense for
non-transport networks where path distance or channel flux has little
influence and is not an important aspect for defining centrality [18]. The
same does not hold for some global measures, where path distance must be
taken into account. Being local, the h-index requires O(D) time to compute,
which is always less than the O(NL) required to calculate B using Brandes'
algorithm [12], where D is the degree of a given node, N is the number of
nodes and L is the number of links of a given network. As the h-index
requires less computational time than E (O(N)), the high correlation between
the two measures shown for the highest ranks suggests that the h-index could
be very suitable for ranking tools and search engines.
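The linear cost in the degree D quoted above can be reached with a counting pass instead of a sort. The sketch below is one possible way to do this, assuming that the degrees of the node's neighbors are already available as a list; the function name is hypothetical.

```python
def h_index_linear(neighbor_degrees):
    """h-index of a node from its neighbors' degrees, in O(D) time and space."""
    d = len(neighbor_degrees)
    # counts[k] = number of neighbors with degree k; degrees above d are
    # clamped to d, since h can never exceed the node's own degree d.
    counts = [0] * (d + 1)
    for degree in neighbor_degrees:
        counts[min(degree, d)] += 1
    at_least = 0
    for h in range(d, 0, -1):
        at_least += counts[h]        # neighbors with degree >= h
        if at_least >= h:
            return h
    return 0
```

Both loops run over at most D + 1 entries, so the cost per node does not depend on the size of the rest of the network.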

Both centrality measures make sense for studying diffusion and epidemic
processes in transport networks, but the relevance of minimal paths is not so
clear for linguistic or cultural networks like thesauri or, as another
example, the network of cultural culinary recipes studied by Kinouchi et al.
[2^ where links between ingredients represent associations but not channels.
For networks similar to the linguistic one studied here, there is a strong
decay of correlations: two words A and C with a minimal path of two links
(that is, A - B - C) are almost uncorrelated, since this means that C is not
a word semantically related to A. The paths between words may perhaps be
relevant to describe associative psychological processes (say, A recalls B,
which recalls C), but they are not channels in the same sense as in physical
transport networks. So, the locality of the h-index could be an advantage for
its application to ranking nodes in non-transport networks where path
distance or channel flux has little relevance and is not an important aspect
for defining centrality [18]. We notice that this could be the case for web
pages, since links represent associations more than channels and users do not
navigate from link to link over large distances.

Figure 4: Log-log dispersion plot of h versus E for the Yeast network. The h
and E centralities are well correlated for E > 0.2, where there is a
power-law bound in E for the highest h values. Inset: linear scale; notice
the cluster of high-h but low-E ribosome proteins.



5. Conclusions 

In conclusion, we studied the h-index in the Moby Thesaurus II and
the Yeast protein-protein interaction networks. Several characteristics of
this centrality index have been highlighted. The h-index seems to be a better
local measure than the node degree D because it incorporates information
about the importance of the node's neighbors. Being local, the h-index
requires O(D) time to compute, which is always less than the O(N) required to
compute E and the O(NL) time required to compute B.

We also found that the h-index is more correlated with Eigenvector
centrality than with Betweenness centrality. Indeed, in the ranking task for
words in the thesaurus, the h-index even seems to outperform E as a
centrality index, detecting basic polysemous words instead of words with low
frequency of use or non-polysemous words.

Since Eigenvector centrality corresponds to the core idea behind the
original Page-Rank algorithm pTj, which is computationally very demanding,
we suggest that the h-index could furnish auxiliary information for ranking
pages in the area of Search Engine Optimization. Because the h-index requires
less time to compute than standard global centrality measures, its use in
other physical, biological and social networks promises very interesting
results.

Appendix



        Ranked by h                      Ranked by E
  h    Eigenvector  Word          Eigenvector   h    Word
 252     0.930      cut             1.000       74   cut up
 237     0.701      set             0.930      252   cut
 233     0.608      run             0.765       31   set upon
 232     0.687      line            0.760      230   turn
 230     0.760      turn            0.701      237   set
 225     0.598      point           0.690      106   break up
 222     0.608      cast            0.687      232   line
 220     0.584      break           0.656       54   line up
 218     0.560      mark            0.649       12   run wild
 216     0.558      measure         0.637       57   turn upside down
  --     0.597      pass            0.618      112   make up
 211     0.570      check            --         45   cast up
 209     0.487      crack           0.608      222   cast
 206     0.562      make            0.608      233   run
  --      --        dash             --         --   crack up
  --      --        stamp            --         --   check out
  --      --        work            0.598      225   point
  --      --        strain          0.597       --   pass
  --      --        hold            0.584      220   break
  --     0.508      form             --         61   pass up
  --      --        beat            0.570      211   check
 193     0.500      get             0.562      206   make
 193     0.429      rank            0.560      218   mark
 193     0.469      round           0.558       73   fix up
 192     0.517      go              0.558      216   measure

Table 1: Top 25 words ranked by Lobby-Hirsch (h) centrality (left) and by
Eigenvector centrality (right).